548 research outputs found

    Taxonomic corpus-based concept summary generation for document annotation.

    Semantic annotation is an enabling technology which links documents to concepts that unambiguously describe their content. Annotation improves access to document contents for both humans and software agents. However, the annotation process is a challenging task, as annotators often have to select from thousands of potentially relevant concepts from controlled vocabularies. The best approaches to assist in this task rely on reusing the annotations of an annotated corpus. In the absence of a pre-annotated corpus, alternative approaches suffer because most vocabularies lack sufficient descriptive text for their concepts. In this paper, we propose an unsupervised method for recommending document annotations based on generating node descriptors from an external corpus. We exploit knowledge of the taxonomic structure of a thesaurus to ensure that effective descriptors (concept summaries) are generated for concepts. Our evaluation on recommending annotations shows that the content that we generate effectively represents the concepts. Also, our approach outperforms those which rely on information from a thesaurus alone and is comparable with supervised approaches.

    MeSHLabeler and DeepMeSH: Recent Progress in Large-Scale MeSH Indexing

    The US National Library of Medicine (NLM) uses the Medical Subject Headings (MeSH) (see Note 1) to index almost all 24 million citations in MEDLINE, which greatly facilitates the application of biomedical information retrieval and text mining. Large-scale automatic MeSH indexing has two challenging aspects: the MeSH side and the citation side. On the MeSH side, each citation is annotated by only 12 (on average) out of all 28,000 MeSH terms. On the citation side, all existing methods, including the Medical Text Indexer (MTI) by NLM, treat text as a bag of words, which cannot capture semantic and context-dependent information well. To address these two challenges, we developed MeSHLabeler and DeepMeSH. By using a “learning to rank” (LTR) framework, MeSHLabeler integrates multiple types of information to address the challenge on the MeSH side, while DeepMeSH integrates deep semantic representation to address the challenge on the citation side. MeSHLabeler achieved first place in both BioASQ2 and BioASQ3, and DeepMeSH achieved first place in both the BioASQ4 and BioASQ5 challenges. DeepMeSH is available at http://datamining-iip.fudan.edu.cn/deepmesh

    Automated annotation of chemical names in the literature with tunable accuracy

    Background: A significant portion of the biomedical and chemical literature refers to small molecules. The accurate identification and annotation of compound names that are relevant to the topic of the given literature can establish links between scientific publications and various chemical and life science databases. Manual annotation is the preferred method for these works because well-trained indexers can understand the paper topics as well as recognize key terms. However, considering the hundreds of thousands of new papers published annually, an automatic annotation system with high precision and relevance can be a useful complement to manual annotation. Results: An automated chemical name annotation system, MeSH Automated Annotations (MAA), was developed to annotate small molecule names in scientific abstracts with tunable accuracy. This system aims to reproduce the MeSH term annotations on biomedical and chemical literature that would be created by indexers. When automated free-text matching was compared with manual indexing on 26 thousand MEDLINE abstracts, more than 40% of the annotations were false-positive (FP) cases. To reduce the FP rate, MAA incorporated several filters to remove "incorrect" annotations caused by nonspecific, partial, and low-relevance chemical names. In part, relevance was measured by the position of the chemical name in the text. Tunable accuracy was obtained by adding or restricting the sections of the text scanned for chemical names. The best precision obtained was 96%, with a 28% recall rate. The best performance of MAA, as measured by the F statistic, was 66%, which compares favorably with other chemical name annotation systems. Conclusions: Accurate chemical name annotation can help researchers not only identify important chemical names in abstracts, but also match unindexed and unstructured abstracts to chemical records. The current work is tested against MEDLINE, but the algorithm is not specific to this corpus, and it may be applicable to papers from chemical physics, materials, polymer and environmental science, as well as patents, biological assay descriptions and other textual data.
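    The tuning idea in this abstract (adding or restricting the sections of text scanned for chemical names) can be illustrated with a small sketch. This is not the MAA implementation: the name list, the section labels, and the functions `annotate` and `f_measure` are hypothetical, and the F statistic is assumed here to be the balanced F measure.

```python
import re

# Hypothetical lexicon; the abstract does not describe MAA's actual dictionary.
CHEMICAL_NAMES = {"aspirin", "ibuprofen", "caffeine"}


def annotate(sections, scanned):
    """Return chemical names found in the selected sections of one abstract.

    sections: dict mapping a section label (e.g. "title") to its text.
    scanned:  labels to scan; scanning fewer sections tends to favour
              precision, scanning more tends to favour recall.
    """
    found = set()
    for label in scanned:
        text = sections.get(label, "").lower()
        for name in CHEMICAL_NAMES:
            # Whole-word match, so a name embedded in a longer token is skipped.
            if re.search(r"\b" + re.escape(name) + r"\b", text):
                found.add(name)
    return found


def f_measure(precision, recall):
    """Balanced F measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative abstract, invented for this example.
abstract = {
    "title": "Aspirin in the prevention of stroke",
    "body": "Patients also reported caffeine intake; ibuprofen was excluded.",
}
strict = annotate(abstract, ["title"])         # title only
broad = annotate(abstract, ["title", "body"])  # title and body
```

    Under that assumption, `f_measure(0.96, 0.28)` is roughly 0.43, so the reported best F of 66% would correspond to a different setting than the one giving the best precision.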

    PREDICT: a method for inferring novel drug indications with application to personalized medicine

    The authors present a new method, PREDICT, for the large-scale prediction of drug indications, and demonstrate its use on both approved drugs and novel molecules. They also provide a proof-of-concept for its potential utility in predicting patient-specific medications.

    Persistence of the immune response induced by BCG vaccination.

    BACKGROUND: Although BCG vaccination is recommended in most countries of the world, little is known of the persistence of BCG-induced immune responses. As novel TB vaccines may be given to boost the immunity induced by neonatal BCG vaccination, evidence concerning the persistence of the BCG vaccine-induced response would help inform decisions about when such boosting would be most effective. METHODS: A randomised control study of UK adolescents was carried out to investigate persistence of BCG immune responses. Adolescents were tested for interferon-gamma (IFN-gamma) response to Mycobacterium tuberculosis purified protein derivative (M.tb PPD) in a whole blood assay before, 3 months, 12 months (n = 148) and 3 years (n = 19) after receiving teenage BCG vaccination or 14 years after receiving infant BCG vaccination (n = 16). RESULTS: A gradual reduction in magnitude of response was evident from 3 months to 1 year and from 1 year to 3 years following teenage vaccination, but responses 3 years after vaccination were still on average 6 times higher than before vaccination among vaccinees. Some individuals (11/86; 13%) failed to make a detectable antigen-specific response three months after vaccination, or lost the response after 1 (11/86; 13%) or 3 (3/19; 16%) years. IFN-gamma response to Ag85 was measured in a subgroup of adolescents and appeared to be better maintained with no decline from 3 to 12 months. A smaller group of adolescents were tested 14 years after receiving infant BCG vaccination and 13/16 (81%) made a detectable IFN-gamma response to M.tb PPD 14 years after infant vaccination as compared to 6/16 (38%) matched unvaccinated controls (p = 0.012); teenagers vaccinated in infancy were 19 times more likely to make an IFN-gamma response of > 500 pg/ml than unvaccinated teenagers. 
CONCLUSION: BCG vaccination in infancy and adolescence induces immunological memory to mycobacterial antigens that is still present and measurable for at least 14 years in the majority of vaccinees, although the magnitude of the peripheral blood response wanes from 3 months to 12 months and from 12 months to 3 years post vaccination. The data presented here suggest that, because of this waning, there may be scope for boosting anti-tuberculous immunity in BCG-vaccinated children at any time from 3 months post vaccination. This supports the prime-boost strategies being employed for some new TB vaccines currently under development.

    Results of the seventh edition of the BioASQ Challenge

    The results of the seventh edition of the BioASQ challenge are presented in this paper. The aim of the BioASQ challenge is the promotion of systems and methodologies through the organization of a challenge on the tasks of large-scale biomedical semantic indexing and question answering. In total, 30 teams with more than 100 systems participated in the challenge this year. As in previous years, the best systems were able to outperform the strong baselines. This suggests that state-of-the-art systems are continuously improving, pushing the frontier of research. Comment: 17 pages, 2 figures

    Automatic construction of rule-based ICD-9-CM coding systems

    Background: In this paper we focus on the problem of automatically constructing ICD-9-CM coding systems for radiology reports. ICD-9-CM codes are used for billing purposes by health institutes and are assigned to clinical records manually following clinical treatment. Since this labeling task requires expert knowledge in the field of medicine, the process itself is costly and prone to errors, as human annotators have to consider thousands of possible codes when assigning the right ICD-9-CM labels to a document. In this study we use the datasets made available for training and testing automated ICD-9-CM coding systems by the organisers of an International Challenge on Classifying Clinical Free Text Using Natural Language Processing in spring 2007. The challenge itself was dominated by entirely or partly rule-based systems that solve the coding task using a set of hand-crafted expert rules. Since the feasibility of constructing such systems for thousands of ICD codes is questionable, we decided to examine the problem of automatically constructing similar rule sets, which turned out to achieve remarkable accuracy in the shared task challenge. Results: Our results are very promising in the sense that we managed to achieve results comparable with purely hand-crafted ICD-9-CM classifiers. Our best model got a 90.26% F measure on the training dataset and an 88.93% F measure on the challenge test dataset, using the micro-averaged Fβ=1 measure, the official evaluation…
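    The micro-averaged Fβ=1 measure named in this abstract pools true positives, false positives and false negatives over all documents before computing precision and recall. A minimal sketch follows; the function name `micro_f1` and the toy gold and predicted code sets are illustrative assumptions, not data from the paper or the challenge.

```python
def micro_f1(gold, predicted):
    """Micro-averaged F1 for multi-label coding.

    gold, predicted: parallel lists with one set of codes per document.
    Counts are summed over all documents before precision and recall
    are computed, so frequent codes weigh more than rare ones.
    """
    tp = fp = fn = 0
    for g, p in zip(gold, predicted):
        tp += len(g & p)  # codes assigned correctly
        fp += len(p - g)  # codes assigned but not in the gold standard
        fn += len(g - p)  # gold codes that were missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy example with three reports; the code assignments are invented.
gold = [{"786.2"}, {"486", "786.2"}, {"599.0"}]
predicted = [{"786.2"}, {"486"}, {"599.0", "788.30"}]
score = micro_f1(gold, predicted)
```

    In this toy case there are 3 true positives, 1 false positive and 1 false negative, so precision and recall are both 0.75 and `score` is 0.75.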

    Mapping data elements to terminological resources for integrating biomedical data sources

    BACKGROUND: Data integration is a crucial task in the biomedical domain, and integrating data sources is one approach to integrating data. Data elements (DEs) in particular play an important role in data integration. We combine schema- and instance-based approaches to mapping DEs to terminological resources in order to facilitate data source integration. METHODS: We extracted DEs from eleven disparate biomedical sources. We compared these DEs to concepts and/or terms in biomedical controlled vocabularies and to reference DEs. We also exploited DE values to disambiguate underspecified DEs and to identify additional mappings. RESULTS: 82.5% of the 474 DEs studied are mapped to entries of a terminological resource and 74.7% of the whole set can be associated with reference DEs. Only 6.6% of the DEs had values that could be semantically typed. CONCLUSION: Our study suggests that the integration of biomedical sources can be achieved automatically with limited precision and largely facilitated by mapping DEs to terminological resources.

    Locally critical quantum phase transitions in strongly correlated metals

    When a metal undergoes a continuous quantum phase transition, non-Fermi liquid behaviour arises near the critical point. It is standard to assume that all low-energy degrees of freedom induced by quantum criticality are spatially extended, corresponding to long-wavelength fluctuations of the order parameter. However, this picture has been contradicted by recent experiments on a prototype system: heavy fermion metals at a zero-temperature magnetic transition. In particular, neutron scattering from CeCu$_{6-x}$Au$_x$ has revealed anomalous dynamics at atomic length scales, leading to much debate as to the fate of the local moments in the quantum-critical regime. Here we report our theoretical finding of a locally critical quantum phase transition in a model of heavy fermions. The dynamics at the critical point are in agreement with experiment. We also argue that local criticality is a phenomenon of general relevance to strongly correlated metals, including doped Mott insulators. Comment: 20 pages, 3 figures; extended version, to appear in Nature